For this assignment, we are going to investigate fires requiring the fire department to respond. Using data about the locations of firehouses and fires occurring in New York City, we want to know whether response times to fires differ across the city. Second, we will try to focus on one possible variable that could affect response times – the distance from the firehouse – and see whether we find the (expected) effect.
To keep this homework manageable, I am leaving out another part of the investigation: What is the effect of demographic and/or income characteristics of the neighborhood on response times. This is likely a bit more sensitive but also relevant from a public policy perspective.
We rely on two data sets.
NYC Open Data has data on all incidents responded to by fire companies. I have included the variable description file in the exercise folder. The following variables are available:
This dataset is only updated annually, and thus far only data from 2013 to 2018 is contained. The full dataset is also somewhat too large for an exercise (2.5M rows), so I suggest to limit yourself to a subset. I have added a file containing the subset of of only building fires (INCIDENT_TYPE_DESC == "111 - Building fire") for 2013 to 2018 only which yields about 14,000 incidents.
Unfortunately, the addresses of the incidents were not geocoded yet. Ideally, I would like you to know how to do this but am mindful about the hour or so required to get this done. So, here is the code. The geocodes (as far as they were returned successfully) are part of the data (as variables lat and lon).
library(ggmap)
library(tidyverse)
library(dplyr)
# Open "building_fires" file
fire_building <- read_csv("data/building_fires.csv")
NYC Open Data also provides data on the location of all 218 firehouses in NYC. Relevant for our analysis are the following variables: FacilityName, Borough, Latitude, Longitude
firehouses <- read_csv("data/FDNY_Firehouse_Listing.csv") %>%
dplyr::filter(!is.na(Latitude))
Note: 5 entries contain missing information, including on the spatial coordinates. We can exclude these for the exercise.
Provide a leaflet map of the highest severity fires (i.e. subset to the highest category in HIGHEST_LEVEL_DESC) contained in the file buiding_fires.csv. Ignore locations that fall outside the five boroughs of New York City. Provide at least three pieces of information on the incident in a popup.
# subset to the highest alarm
unique(fire_building$HIGHEST_LEVEL_DESC)
## [1] "7 - Signal 7-5"
## [2] "2 - 2nd alarm"
## [3] "1 - More than initial alarm, less than Signal 7-5"
## [4] "3 - 3rd alarm"
## [5] "5 - 5th alarm"
## [6] "4 - 4th alarm"
## [7] NA
## [8] "0 - Initial alarm"
## [9] "75 - All Hands Working"
## [10] "22 - Second Alarm"
## [11] "55 - Fifth Alarm"
## [12] "11 - First Alarm"
## [13] "33 - Third Alarm"
## [14] "44 - Fourth Alarm"
highest_alarm <- fire_building %>%
filter(HIGHEST_LEVEL_DESC=="7 - Signal 7-5"| HIGHEST_LEVEL_DESC=="75 - All Hands Working")
unique(highest_alarm$HIGHEST_LEVEL_DESC)
## [1] "7 - Signal 7-5" "75 - All Hands Working"
head(highest_alarm)
## # A tibble: 6 x 27
## IM_INCIDENT_KEY FIRE_BOX INCIDENT_TYPE_D… INCIDENT_DATE_T… ARRIVAL_DATE_TI…
## <dbl> <chr> <chr> <chr> <chr>
## 1 55672965 2595 111 - Building … 01/01/2013 12:5… 01/01/2013 01:0…
## 2 55673299 2591 111 - Building … 01/01/2013 02:2… 01/01/2013 02:2…
## 3 55674170 1726 111 - Building … 01/01/2013 09:4… 01/01/2013 09:4…
## 4 55675138 0966 111 - Building … 01/01/2013 07:1… 01/01/2013 07:1…
## 5 55675164 7343 111 - Building … 01/01/2013 07:2… 01/01/2013 07:2…
## 6 55675781 5034 111 - Building … 01/02/2013 01:0… 01/02/2013 01:1…
## # … with 22 more variables: UNITS_ONSCENE <dbl>,
## # LAST_UNIT_CLEARED_DATE_TIME <chr>, HIGHEST_LEVEL_DESC <chr>,
## # TOTAL_INCIDENT_DURATION <dbl>, ACTION_TAKEN1_DESC <chr>,
## # ACTION_TAKEN2_DESC <chr>, ACTION_TAKEN3_DESC <chr>,
## # PROPERTY_USE_DESC <chr>, STREET_HIGHWAY <chr>, ZIP_CODE <dbl>,
## # BOROUGH_DESC <chr>, FLOOR <chr>, CO_DETECTOR_PRESENT_DESC <chr>,
## # FIRE_ORIGIN_BELOW_GRADE_FLAG <dbl>, STORY_FIRE_ORIGIN_COUNT <dbl>,
## # FIRE_SPREAD_DESC <chr>, DETECTOR_PRESENCE_DESC <chr>,
## # AES_PRESENCE_DESC <chr>, STANDPIPE_SYS_PRESENT_FLAG <dbl>, address <chr>,
## # lon <dbl>, lat <dbl>
library(leaflet)
library(stringr)
highest_alarm_map <- leaflet(highest_alarm, options = leafletOptions(minZoom = 5, dragging = TRUE))%>%
addProviderTiles(provider = "Esri")%>%
setView( lat= 40.712742, lng=-74.013382, zoom = 10 ) %>%
addCircleMarkers(lng = ~lon,lat = ~lat, radius = 1, popup = ~paste0("Date: ",str_extract(INCIDENT_DATE_TIME, pattern = "[0-9]+/[0-9]+/[0-9]+"), "<br/>", "Address: ", address, "<br/>", "Spread: ", str_sub(FIRE_SPREAD_DESC, 5,-1)))
highest_alarm_map
Start with the previous map. Now, distinguish the markers of the fire locations by PROPERTY_USE_DESC, i.e. what kind of property was affected. If there are too many categories, collapse some categories. Choose an appropriate coloring scheme to map the locations by type of affected property. Add a legend informing the user about the color scheme. Also make sure that the information about the type of affected property is now contained in the popup information. Show this map.
unique(highest_alarm$PROPERTY_USE_DESC)
## [1] "579 - Motor vehicle or boat sales, services, repair"
## [2] "429 - Multifamily dwelling"
## [3] "419 - 1 or 2 family dwelling"
## [4] "700 - Manufacturing, processing"
## [5] "161 - Restaurant or cafeteria"
## [6] "881 - Parking garage, (detached residential garage)"
## [7] "564 - Laundry, dry cleaning"
## [8] "400 - Residential, other"
## [9] "559 - Recreational, hobby, home repair sales, pet store"
## [10] "500 - Mercantile, business, other"
## [11] "549 - Specialty shop"
## [12] "331 - Hospital - medical or psychiatric"
## [13] "131 - Church, mosque, synagogue, temple, chapel"
## [14] "210 - Schools, non-adult, other"
## [15] "519 - Food and beverage sales, grocery store"
## [16] "449 - Hotel/motel, commercial"
## [17] "173 - Bus station"
## [18] "891 - Warehouse"
## [19] "599 - Business office"
## [20] "580 - General retail, other"
## [21] "311 - 24-hour care Nursing homes, 4 or more persons"
## [22] "UUU - Undetermined"
## [23] "888 - Fire station"
## [24] "900 - Outside or special property, other"
## [25] "965 - Vehicle parking area"
## [26] "100 - Assembly, other"
## [27] "960 - Street, other"
## [28] "332 - Hospices"
## [29] "439 - Boarding/rooming house, residential hotels"
## [30] "123 - Stadium, arena"
## [31] "141 - Athletic/health club"
## [32] "931 - Open land or field"
## [33] "962 - Residential street, road or residential driveway"
## [34] "899 - Residential or self-storage units"
## [35] "322 - Alcohol or substance abuse recovery center"
## [36] "170 - Passenger terminal, other"
## [37] "557 - Personal service, including barber & beauty shops"
## [38] "880 - Vehicle storage, other"
## [39] "460 - Dormitory-type residence, other"
## [40] "800 - Storage, other"
## [41] "610 - Energy production plant, other"
## [42] "926 - Outbuilding, protective shelter"
## [43] "000 - Property Use, other"
## [44] "511 - Convenience store"
## [45] "162 - Bar or nightclub"
## [46] "539 - Household goods, sales, repairs"
## [47] "981 - Construction site"
## [48] "180 - Studio/theater, other"
## [49] "183 - Movie theater"
## [50] "459 - Residential board and care"
## [51] "241 - Adult education center, college classroom"
## [52] "152 - Museum"
## [53] "593 - Office: veterinary or research"
## [54] "130 - Places of worship, funeral parlors, other"
## [55] "160 - Eating, drinking places, other"
## [56] "300 - Health care, detention, & correction, other"
## [57] "642 - Electrical distribution"
## [58] "963 - Street or road in commercial area"
## [59] "342 - Doctor, dentist or oral surgeon office"
## [60] "529 - Textile, wearing apparel sales"
## [61] "321 - Mental retardation/development disability facility"
## [62] "340 - Clinics, doctors offices, hemodialysis cntr, other"
## [63] "882 - Parking garage, general vehicle"
## [64] "150 - Public or government, other"
## [65] "174 - Rapid transit station"
## [66] "648 - Sanitation utility"
## [67] "200 - Educational, other"
## [68] "596 - Post office or mailing firms"
## [69] "215 - High school/junior high school/middle school"
## [70] "110 - Fixed-use recreation places, other"
## [71] "592 - Bank"
## [72] "365 - Police station"
## [73] "571 - Service station, gas station"
## [74] "182 - Auditorium, concert hall"
## [75] "581 - Department or discount store"
## [76] "808 - Outbuilding or shed"
## [77] "839 - Refrigerated storage"
## [78] "121 - Ballroom, gymnasium"
## [79] "112 - Billiard center, pool hall"
## [80] "NNN - None"
## [81] "142 - Clubhouse"
## [82] "140 - Clubs, other"
## [83] "569 - Professional supplies, services"
## [84] "124 - Playground"
## [85] "363 - Reformatory, juvenile detention center"
## [86] "974 - Aircraft loading area"
## [87] "464 - Barracks, dormitory"
## [88] "635 - Computer center"
## [89] "211 - Preschool"
## [90] "186 - Film/movie production studio"
## [91] "181 - Live performance theater"
## [92] "134 - Funeral parlor"
## [93] "984 - Industrial plant yard - area"
## [94] "629 - Laboratory or science lababoratory"
## [95] "144 - Casino, gambling clubs"
## [96] "936 - Vacant lot"
## [97] "254 - Day care, in commercial property"
## [98] "155 - Courthouse"
## [99] "807 - Outside material storage area"
## [100] "250 - Day care, other (Conversion only)"
## [101] "615 - Electric-generating plant"
## [102] "213 - Elementary school, including kindergarten"
## [103] "143 - Yacht Club"
## [104] "341 - Clinic, clinic-type infirmary"
## [105] "639 - Communications center"
## [106] "898 - Dock, marina, pier, wharf"
## [107] "952 - Railroad yard"
highest_alarm %>%
group_by(PROPERTY_USE_DESC) %>%
count()%>%
arrange(desc (n))
## # A tibble: 107 x 2
## # Groups: PROPERTY_USE_DESC [107]
## PROPERTY_USE_DESC n
## <chr> <int>
## 1 429 - Multifamily dwelling 4383
## 2 419 - 1 or 2 family dwelling 2974
## 3 500 - Mercantile, business, other 404
## 4 161 - Restaurant or cafeteria 154
## 5 519 - Food and beverage sales, grocery store 135
## 6 599 - Business office 123
## 7 564 - Laundry, dry cleaning 71
## 8 881 - Parking garage, (detached residential garage) 69
## 9 400 - Residential, other 68
## 10 449 - Hotel/motel, commercial 49
## # … with 97 more rows
library(stringr)
highest_alarm['class']=list(str_sub(highest_alarm$PROPERTY_USE_DESC,1,1))
highest_alarm$class[highest_alarm$class == "1"] <- "Assembly"
highest_alarm$class[highest_alarm$class == "2"] <- "Educational"
highest_alarm$class[highest_alarm$class == "3"] <- "Healthcare, Detention and Correction"
highest_alarm$class[highest_alarm$class == "4"] <- "Residential"
highest_alarm$class[highest_alarm$class == "5"] <- "Mercantile and Business"
highest_alarm$class[highest_alarm$class == "6"] <- "Energy Production Plant"
highest_alarm$class[highest_alarm$class == "7"] <- "Manufacturing and Processing"
highest_alarm$class[highest_alarm$class == "8"] <- "Storage"
highest_alarm$class[highest_alarm$class %in% c("9","U","N","0")] <- "Other Property"
highest_alarm$class <- factor(highest_alarm$class, levels= c("Assembly","Educational","Healthcare, Detention and Correction", "Residential", "Mercantile and Business", "Energy Production Plant", "Manufacturing and Processing", "Storage", "Other Property"))
highest_alarm %>%
group_by(class) %>%
count() %>%
arrange(desc(n))
## # A tibble: 9 x 2
## # Groups: class [9]
## class n
## <fct> <int>
## 1 Residential 7508
## 2 Mercantile and Business 856
## 3 Assembly 290
## 4 Storage 149
## 5 Other Property 140
## 6 Healthcare, Detention and Correction 65
## 7 Educational 48
## 8 Manufacturing and Processing 40
## 9 Energy Production Plant 8
library(RColorBrewer)
pal <- colorFactor(palette = "Set3", levels = c("Assembly","Educational","Healthcare, Detention and Correction", "Residential", "Mercantile and Business", "Energy Production Plant", "Manufacturing and Processing", "Storage", "Other Property"))
property_map <- leaflet(highest_alarm, options = leafletOptions(minZoom = 5, dragging = TRUE))%>%
addProviderTiles(provider = "CartoDB")%>%
setView( lat= 40.712742, lng=-74.013382, zoom = 10 ) %>%
addCircleMarkers(lng = ~lon,lat = ~lat, radius = 1, color = ~pal(class), popup = ~paste0("<b>","Property Type: ", class,"</b>","<br/>","Date: ",str_extract(INCIDENT_DATE_TIME, pattern = "[0-9]+/[0-9]+/[0-9]+"), "<br/>", "Address: ", address, "<br/>", "Spread: ", str_sub(FIRE_SPREAD_DESC, 5,-1)))%>%
addLegend(pal = pal, values = c("Assembly","Educational","Healthcare, Detention and Correction", "Residential", "Mercantile and Business", "Energy Production Plant", "Manufacturing and Processing", "Storage", "Other Property"), opacity = 0.8, title = "Property Affected",position = "topleft")
property_map
Add marker clustering, so that zooming in will reveal the individual locations but the zoomed out map only shows the clusters. Show the map with clusters.
cluster_property_map <- leaflet(highest_alarm, options = leafletOptions(minZoom = 5, dragging = TRUE))%>%
addProviderTiles(provider = "CartoDB")%>%
setView( lat= 40.712742, lng=-74.013382, zoom = 10 ) %>%
addCircleMarkers(lng = ~lon,lat = ~lat, radius = 1, color = ~pal(class), clusterOptions = markerClusterOptions(), popup = ~paste0("<b>","Property Type: ", class,"</b>","<br/>","Date: ",str_extract(INCIDENT_DATE_TIME, pattern = "[0-9]+/[0-9]+/[0-9]+"), "<br/>", "Address: ", address, "<br/>", "Spread: ", str_sub(FIRE_SPREAD_DESC, 5,-1)))%>%
addLegend(pal = pal, values = c("Assembly","Educational","Healthcare, Detention and Correction", "Residential", "Mercantile and Business", "Energy Production Plant", "Manufacturing and Processing", "Storage", "Other Property"), opacity = 0.8, title = "Property Affected",position = "topleft")
cluster_property_map
The second data file contains the locations of the 218 firehouses in New York City. Start with the non-clustered map (2b) and now adjust the size of the circle markers by severity (TOTAL_INCIDENT_DURATION or UNITS_ONSCENE seem plausible options). More severe incidents should have larger circles on the map. On the map, also add the locations of the fire houses. Add two layers (“Incidents”, “Firehouses”) that allow the user to select which information to show.
#highest_alarm$TOTAL_INCIDENT_DURATION)
summary(highest_alarm$TOTAL_INCIDENT_DURATION)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 870 4907 6803 7308 8918 428335 1
incidents_firehouse_map <-leaflet(highest_alarm, options = leafletOptions(minZoom = 5, dragging = TRUE))%>%
addProviderTiles(provider = "CartoDB")%>%
setView( lat= 40.712742, lng=-74.013382, zoom = 10 ) %>%
addCircleMarkers(lng = ~lon,lat = ~lat, radius = ~TOTAL_INCIDENT_DURATION/5000 , color = ~pal(class), popup = ~paste0("<b>","Property Type: ", class,"</b>","<br/>","Date: ",str_extract(INCIDENT_DATE_TIME, pattern = "[0-9]+/[0-9]+/[0-9]+"), "<br/>", "Address: ", address, "<br/>", "Spread: ", str_sub(FIRE_SPREAD_DESC, 5,-1)), group = "Incidents")%>%
addLegend(pal = pal, values = c("Assembly","Educational","Healthcare, Detention and Correction", "Residential", "Mercantile and Business", "Energy Production Plant", "Manufacturing and Processing", "Storage", "Other Property"), opacity = 0.8, title = "Property Affected",position = "topleft")%>%
addMarkers(data = firehouses, lng = ~Longitude, lat = ~Latitude, group = "Firehouses")%>%
addLayersControl(overlayGroups = c("Incidents", "Firehouses"))
incidents_firehouse_map
We now want to investigate whether the distance of the incident from the nearest firehouse varies across the city.
For all incident locations (independent of severity), identify the nearest firehouse and calculate the distance between the firehouse and the incident location. Provide a scatter plot showing the time until the first engine arrived (the variables INCIDENT_DATE_TIME and ARRIVAL_DATE_TIME) will be helpful.
library(sp)
library(sf)
library(raster)
library(rgeos)
# transform firehouse into sf object
firehouse_sf <- st_as_sf(firehouses, coords = c("Longitude", "Latitude"), crs = 4326)
# build an empty list to store the nearest distance
nearest <-list()
# iterate the fire_building and transform them into sp objects
# use st_distance to calculate the distances and find the nearest firehouse
for(i in 1:nrow(fire_building)){
point_sf <- st_as_sf(fire_building[i,], coords = c("lon", "lat"), crs = 4326)
nearest[i] <- min(st_distance(point_sf, firehouse_sf))
}
length(nearest)
## [1] 14200
any(is.na(nearest))
## [1] FALSE
# create a new column in fire_building about the nearest distance from firehouse
fire_building['distance'] <- unlist(nearest)
distance_and_time <- fire_building %>%
dplyr::select(INCIDENT_DATE_TIME, ARRIVAL_DATE_TIME, PROPERTY_USE_DESC,HIGHEST_LEVEL_DESC,BOROUGH_DESC, lon, lat, distance)
head(distance_and_time)
## # A tibble: 6 x 8
## INCIDENT_DATE_T… ARRIVAL_DATE_TI… PROPERTY_USE_DE… HIGHEST_LEVEL_D…
## <chr> <chr> <chr> <chr>
## 1 01/01/2013 12:5… 01/01/2013 01:0… 579 - Motor veh… 7 - Signal 7-5
## 2 01/01/2013 02:2… 01/01/2013 02:2… 429 - Multifami… 7 - Signal 7-5
## 3 01/01/2013 06:2… 01/01/2013 06:2… 429 - Multifami… 2 - 2nd alarm
## 4 01/01/2013 07:5… 01/01/2013 08:0… 429 - Multifami… 1 - More than i…
## 5 01/01/2013 09:4… 01/01/2013 09:4… 429 - Multifami… 7 - Signal 7-5
## 6 01/01/2013 11:0… 01/01/2013 11:1… 429 - Multifami… 1 - More than i…
## # … with 4 more variables: BOROUGH_DESC <chr>, lon <dbl>, lat <dbl>,
## # distance <dbl>
incident_time <- as.POSIXct(strptime(distance_and_time[['INCIDENT_DATE_TIME']], format = "%m/%d/%Y %H:%M:%S %p"))
arrival_time <- as.POSIXct(strptime(distance_and_time[['ARRIVAL_DATE_TIME']], format = "%m/%d/%Y %H:%M:%S %p"))
waiting_time <- arrival_time-incident_time
distance_and_time['waiting_time_secs'] <- as.numeric(waiting_time)
#There are some mistakes in original records like wrong AM/PM
#Transfrom those wrong records (negative numbers) by adding 12 hours back
for(i in 1:nrow(distance_and_time)){
if (!is.na(distance_and_time[i,'waiting_time_secs'])&distance_and_time[i,'waiting_time_secs'] < 0) {
distance_and_time[i,'waiting_time_secs'] <- distance_and_time[i,'waiting_time_secs']+12*60*60
}
}
head(distance_and_time)
## # A tibble: 6 x 9
## INCIDENT_DATE_T… ARRIVAL_DATE_TI… PROPERTY_USE_DE… HIGHEST_LEVEL_D…
## <chr> <chr> <chr> <chr>
## 1 01/01/2013 12:5… 01/01/2013 01:0… 579 - Motor veh… 7 - Signal 7-5
## 2 01/01/2013 02:2… 01/01/2013 02:2… 429 - Multifami… 7 - Signal 7-5
## 3 01/01/2013 06:2… 01/01/2013 06:2… 429 - Multifami… 2 - 2nd alarm
## 4 01/01/2013 07:5… 01/01/2013 08:0… 429 - Multifami… 1 - More than i…
## 5 01/01/2013 09:4… 01/01/2013 09:4… 429 - Multifami… 7 - Signal 7-5
## 6 01/01/2013 11:0… 01/01/2013 11:1… 429 - Multifami… 1 - More than i…
## # … with 5 more variables: BOROUGH_DESC <chr>, lon <dbl>, lat <dbl>,
## # distance <dbl>, waiting_time_secs <dbl>
summary(distance_and_time)
## INCIDENT_DATE_TIME ARRIVAL_DATE_TIME PROPERTY_USE_DESC HIGHEST_LEVEL_DESC
## Length:14200 Length:14200 Length:14200 Length:14200
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## BOROUGH_DESC lon lat distance
## Length:14200 Min. :-74.25 Min. :40.50 Min. : 7.75
## Class :character 1st Qu.:-73.97 1st Qu.:40.67 1st Qu.: 362.06
## Mode :character Median :-73.92 Median :40.72 Median : 561.04
## Mean :-73.92 Mean :40.73 Mean : 666.23
## 3rd Qu.:-73.87 3rd Qu.:40.80 3rd Qu.: 863.99
## Max. :-73.09 Max. :41.60 Max. :101812.17
##
## waiting_time_secs
## Min. : 12.0
## 1st Qu.: 172.0
## Median : 208.0
## Mean : 421.1
## 3rd Qu.: 250.0
## Max. :86816.0
## NA's :23
library(ggplot2)
library(ggthemes)
library(plotly)
df_plot <-distance_and_time %>%
filter(!is.na(waiting_time_secs))%>%
filter(distance < 5000) %>%
filter(waiting_time_secs < 5000)
p <-ggplot(df_plot, aes(x = distance, y = waiting_time_secs))+
geom_point(alpha = 0.5)+
labs(x = "Distance From The Nearest Firehouse (m)", y = "Waiting Time For The First Engine (secs)")+
theme_clean()
p
Now also visualize the patterns separately for severe and non-severe incidents (use
HIGHEST_LEVEL_DESC but feel free to reduce the number of categories). What do you find?
fire_levels <- distance_and_time%>%
filter(!is.na(HIGHEST_LEVEL_DESC))
fire_levels['level']=list(str_sub(fire_levels$HIGHEST_LEVEL_DESC,1,2))
fire_levels$level[fire_levels$level %in% c("7 ", "75")] <- "High Alarm"
fire_levels$level[fire_levels$level %in% c("5 ", "55", "4 ", "44", "3 ", "33")] <- "Medium Alarm"
fire_levels$level[fire_levels$level %in% c("2 ", "22", "11", "0 ")] <- "Low Alarm"
fire_levels$level[fire_levels$level %in% c("1 ")] <- "Undefined Alarm"
fire_levels$level <- factor(fire_levels$level, levels= c("Low Alarm","Medium Alarm","High Alarm","Undefined Alarm"))
fire_levels_plot <- fire_levels %>%
filter(!is.na(waiting_time_secs))%>%
filter(distance < 5000) %>%
filter(waiting_time_secs < 1000)
ggplot(fire_levels_plot, aes(x=waiting_time_secs, y=distance, color = level))+
geom_point(alpha=0.4)+
facet_grid(~ level)+
labs(x = "Waiting Time For The First Engine (secs)", y = "Distance From The Nearest Firehouse (m)")+
theme_bw()
##### b) Map of Response Times
Provide a map visualization of response times. Investigate whether the type of property affected (PROPERTY_USE_DESC) or fire severity (HIGHEST_LEVEL_DESC) play a role here.
response_time <- fire_levels
response_time['class']=list(str_sub(response_time$PROPERTY_USE_DESC,1,1))
response_time$class[response_time$class == "1"] <- "Assembly"
response_time$class[response_time$class == "2"] <- "Educational"
response_time$class[response_time$class == "3"] <- "Healthcare, Detention and Correction"
response_time$class[response_time$class == "4"] <- "Residential"
response_time$class[response_time$class == "5"] <- "Mercantile and Business"
response_time$class[response_time$class == "6"] <- "Energy Production Plant"
response_time$class[response_time$class == "7"] <- "Manufacturing and Processing"
response_time$class[response_time$class == "8"] <- "Storage"
response_time$class[response_time$class %in% c("9","U","N","0")] <- "Other Property"
response_time$class <- factor(response_time$class, levels= c("Assembly","Educational","Healthcare, Detention and Correction", "Residential", "Mercantile and Business", "Energy Production Plant", "Manufacturing and Processing", "Storage", "Other Property"))
head(response_time)
## # A tibble: 6 x 11
## INCIDENT_DATE_T… ARRIVAL_DATE_TI… PROPERTY_USE_DE… HIGHEST_LEVEL_D…
## <chr> <chr> <chr> <chr>
## 1 01/01/2013 12:5… 01/01/2013 01:0… 579 - Motor veh… 7 - Signal 7-5
## 2 01/01/2013 02:2… 01/01/2013 02:2… 429 - Multifami… 7 - Signal 7-5
## 3 01/01/2013 06:2… 01/01/2013 06:2… 429 - Multifami… 2 - 2nd alarm
## 4 01/01/2013 07:5… 01/01/2013 08:0… 429 - Multifami… 1 - More than i…
## 5 01/01/2013 09:4… 01/01/2013 09:4… 429 - Multifami… 7 - Signal 7-5
## 6 01/01/2013 11:0… 01/01/2013 11:1… 429 - Multifami… 1 - More than i…
## # … with 7 more variables: BOROUGH_DESC <chr>, lon <dbl>, lat <dbl>,
## # distance <dbl>, waiting_time_secs <dbl>, level <fct>, class <fct>
low <- filter(response_time, level == "Low Alarm")
medium <- filter(response_time, level == "Medium Alarm")
high <- filter(response_time, level == "High Alarm")
pal_level <- colorFactor(palette = c("#f03b20", "#feb24c", "#ffeda0"),
levels = c("High Alarm","Medium Alarm","Low Alarm"))
alarm_map <- leaflet(options = leafletOptions(minZoom = 8, dragging = TRUE)) %>%
addProviderTiles("CartoDB.DarkMatter",options = providerTileOptions(attribution = ""))%>%
addCircleMarkers(data = low, lng = ~lon, lat = ~lat, radius = ~waiting_time_secs/5000, fillOpacity=0.3,
color = ~pal_level(level), group = "Low Alarm", popup=~paste("Alarm Level: ",level,
"<br>Response Time: ", waiting_time_secs," seconds", "<br>Property Type: ", class)) %>%
addCircleMarkers(data = medium, lng = ~lon, lat = ~lat, radius = ~waiting_time_secs/5000, fillOpacity=0.3,
color = ~pal_level(level), group = "Medium Alarm", popup=~paste("Alarm Level: ",level,
"<br>Response Time: ", waiting_time_secs, " seconds", "<br>Property Type: ", class)) %>%
addCircleMarkers(data = high, lng = ~lon, lat = ~lat, radius = ~waiting_time_secs/5000, fillOpacity=0.3,
color = ~pal_level(level), group = "High Alarm", popup=~paste("Alarm Level: ",level,
"<br>Response Time: ", waiting_time_secs, " seconds", "<br>Property Type: ", class)) %>%
setView(lat= 40.712742, lng=-74.013382, zoom = 10) %>%
addLegend(pal = pal_level, values = c("High Alarm","Medium Alarm", "Low Alarm"), opacity = 0.7, title = "Alarm Level",
position = "topleft")%>%
addLayersControl(overlayGroups = c("High Alarm","Medium Alarm","Low Alarm"))
alarm_map
long_response <- subset(response_time, waiting_time_secs > 500)
fireIcons <- icons(
iconUrl = "data/redflame.png",
iconWidth = 15, iconHeight = 15,
iconAnchorX = 7.5, iconAnchorY = 8.5
)
pal_class <- colorFactor(palette = "Tableau10", levels = c("Assembly","Educational","Healthcare, Detention and Correction", "Residential", "Mercantile and Business", "Energy Production Plant", "Manufacturing and Processing", "Storage", "Other Property"))
leaflet(options = leafletOptions(minZoom = 5, dragging = TRUE))%>%
addProviderTiles(provider = "CartoDB") %>%
addCircleMarkers(data = response_time, lng = ~lon, lat = ~lat, radius = ~waiting_time_secs/5000, color = ~pal_class(class),
fillOpacity=0.3, popup = ~paste("Property Type: ", class, "<br>Response Time: ", waiting_time_secs," seconds"))%>%
addMarkers(data= long_response,icon = fireIcons,
popup = ~paste("Property Type: ", class, "<br>Response Time: ", waiting_time_secs," seconds"))%>%
setView( lat= 40.712742, lng=-74.013382, zoom = 10 )
According to the plot above, those big circles and fire flame icons are the incidents with a long response time.Clicking the popup, we can see most of them are residential properties. Then we remove those “Residential” records and find some incidents happened in businesses and assembly also had a long response time.
fireIcons2 <- icons(
iconUrl = "data/flame.png",
iconWidth = 15, iconHeight = 15,
iconAnchorX = 7.5, iconAnchorY = 8.5
)
pal_property <- colorFactor(palette = "Spectral", levels = c("Assembly","Educational","Healthcare, Detention and Correction","Mercantile and Business", "Energy Production Plant", "Manufacturing and Processing", "Storage", "Other Property"))
leaflet(options = leafletOptions(minZoom = 5, dragging = TRUE))%>%
addProviderTiles("Esri.WorldImagery", options = providerTileOptions(attribution = "")) %>%
addCircleMarkers(data = subset(response_time, response_time$class != "Residential"), lng = ~lon, lat = ~lat, radius = ~waiting_time_secs/5000, color = ~pal_property(class), fillOpacity=0.7, popup = ~paste("Property Type: ", class, "<br>Response Time: ", waiting_time_secs," seconds"))%>%
addMarkers(data= subset(long_response, long_response$class != "Residential"),icon = fireIcons2,
popup = ~paste("Property Type: ", class, "<br>Response Time: ", waiting_time_secs," seconds"))%>%
addLegend(pal = pal_property, values = c("Assembly","Educational","Healthcare, Detention and Correction","Mercantile and Business", "Energy Production Plant", "Manufacturing and Processing", "Storage", "Other Property"), opacity = 0.7, title = "Property Affected",position = "topleft")%>%
setView( lat= 40.712742, lng=-74.013382, zoom = 10 )
Show a faceted choropleth map indicating how response times have developed over the years. What do you find?
response_borough<-distance_and_time %>%
select(INCIDENT_DATE_TIME,BOROUGH_DESC, waiting_time_secs)
response_borough['borough'] = list(str_sub(response_borough$BOROUGH_DESC,1,1))
response_borough$borough[response_borough$borough == "1"] <- "Manhattan"
response_borough$borough[response_borough$borough == "2"] <- "Bronx"
response_borough$borough[response_borough$borough == "3"] <- "Staten Island"
response_borough$borough[response_borough$borough == "4"] <- "Brooklyn"
response_borough$borough[response_borough$borough == "5"] <- "Queens"
response_borough$borough <- factor(response_borough$borough, levels= c("Manhattan","Bronx","Staten Island","Brooklyn","Queens"))
response_borough['year'] = list(str_sub(response_borough$INCIDENT_DATE_TIME,7,10))
response_borough$year <- factor(response_borough$year, levels= c("2013","2014","2015","2016","2017","2018"))
head(response_borough)
## # A tibble: 6 x 5
## INCIDENT_DATE_TIME BOROUGH_DESC waiting_time_secs borough year
## <chr> <chr> <dbl> <fct> <fct>
## 1 01/01/2013 12:58:10 AM 4 - Brooklyn 160 Brooklyn 2013
## 2 01/01/2013 02:22:56 AM 2 - Bronx 147 Bronx 2013
## 3 01/01/2013 06:20:49 AM 1 - Manhattan 324 Manhattan 2013
## 4 01/01/2013 07:59:48 AM 5 - Queens 225 Queens 2013
## 5 01/01/2013 09:47:27 AM 4 - Brooklyn 118 Brooklyn 2013
## 6 01/01/2013 11:09:16 AM 1 - Manhattan 155 Manhattan 2013
subset(response_borough, response_borough$borough == "Queens"&response_borough$year =="2013")
## # A tibble: 662 x 5
## INCIDENT_DATE_TIME BOROUGH_DESC waiting_time_secs borough year
## <chr> <chr> <dbl> <fct> <fct>
## 1 01/01/2013 07:59:48 AM 5 - Queens 225 Queens 2013
## 2 01/01/2013 07:25:38 PM 5 - Queens 156 Queens 2013
## 3 01/02/2013 01:07:58 AM 5 - Queens 197 Queens 2013
## 4 01/03/2013 03:44:49 AM 5 - Queens 244 Queens 2013
## 5 01/04/2013 04:05:35 PM 5 - Queens 179 Queens 2013
## 6 01/04/2013 11:41:57 PM 5 - Queens 235 Queens 2013
## 7 01/05/2013 04:18:01 PM 5 - Queens 259 Queens 2013
## 8 01/05/2013 05:14:56 PM 5 - Queens 184 Queens 2013
## 9 01/06/2013 01:56:34 PM 5 - Queens 165 Queens 2013
## 10 01/06/2013 07:42:38 PM 5 - Queens 184 Queens 2013
## # … with 652 more rows
average_response_time <-response_borough %>%
filter(!is.na(waiting_time_secs))%>%
group_by(borough, year) %>%
summarise(mean_response_time = round(mean(waiting_time_secs),2))
average_response_time
## # A tibble: 30 x 3
## # Groups: borough [5]
## borough year mean_response_time
## <fct> <fct> <dbl>
## 1 Manhattan 2013 388.
## 2 Manhattan 2014 380.
## 3 Manhattan 2015 398.
## 4 Manhattan 2016 233.
## 5 Manhattan 2017 639.
## 6 Manhattan 2018 236.
## 7 Bronx 2013 498.
## 8 Bronx 2014 920.
## 9 Bronx 2015 397.
## 10 Bronx 2016 411.
## # … with 20 more rows
library(rgdal)
borough <- readOGR("data/borough_boundaries.geojson", verbose=FALSE)
borough@data
## boro_code boro_name shape_area shape_leng
## 0 2 Bronx 1186612478.22 462958.187332
## 1 5 Staten Island 1623756421.89 325960.628294
## 2 3 Brooklyn 1937593022.23 738745.840717
## 3 4 Queens 3045878293.26 904188.424111
## 4 1 Manhattan 636602658.764 361212.476577
shp_response <- borough@data %>%
right_join(average_response_time, by = c("boro_name"= "borough"))
shp_response
## boro_code boro_name shape_area shape_leng year mean_response_time
## 1 1 Manhattan 636602658.764 361212.476577 2013 387.76
## 2 1 Manhattan 636602658.764 361212.476577 2014 380.41
## 3 1 Manhattan 636602658.764 361212.476577 2015 398.38
## 4 1 Manhattan 636602658.764 361212.476577 2016 233.43
## 5 1 Manhattan 636602658.764 361212.476577 2017 638.94
## 6 1 Manhattan 636602658.764 361212.476577 2018 236.01
## 7 2 Bronx 1186612478.22 462958.187332 2013 497.75
## 8 2 Bronx 1186612478.22 462958.187332 2014 920.34
## 9 2 Bronx 1186612478.22 462958.187332 2015 396.86
## 10 2 Bronx 1186612478.22 462958.187332 2016 410.73
## 11 2 Bronx 1186612478.22 462958.187332 2017 228.81
## 12 2 Bronx 1186612478.22 462958.187332 2018 244.08
## 13 5 Staten Island 1623756421.89 325960.628294 2013 231.58
## 14 5 Staten Island 1623756421.89 325960.628294 2014 234.76
## 15 5 Staten Island 1623756421.89 325960.628294 2015 217.37
## 16 5 Staten Island 1623756421.89 325960.628294 2016 226.07
## 17 5 Staten Island 1623756421.89 325960.628294 2017 223.32
## 18 5 Staten Island 1623756421.89 325960.628294 2018 246.95
## 19 3 Brooklyn 1937593022.23 738745.840717 2013 498.28
## 20 3 Brooklyn 1937593022.23 738745.840717 2014 297.12
## 21 3 Brooklyn 1937593022.23 738745.840717 2015 188.05
## 22 3 Brooklyn 1937593022.23 738745.840717 2016 537.03
## 23 3 Brooklyn 1937593022.23 738745.840717 2017 427.31
## 24 3 Brooklyn 1937593022.23 738745.840717 2018 1007.03
## 25 4 Queens 3045878293.26 904188.424111 2013 351.84
## 26 4 Queens 3045878293.26 904188.424111 2014 492.58
## 27 4 Queens 3045878293.26 904188.424111 2015 482.59
## 28 4 Queens 3045878293.26 904188.424111 2016 359.71
## 29 4 Queens 3045878293.26 904188.424111 2017 535.73
## 30 4 Queens 3045878293.26 904188.424111 2018 248.77
borough@data <-shp_response %>%
filter(year =="2013")
summary(borough$mean_response_time)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 231.6 351.8 387.8 393.4 497.8 498.3
pal_response <- colorNumeric("PuOr", domain = borough$mean_response_time)
map2013 <-borough %>%
leaflet()%>%
addProviderTiles("CartoDB")%>%
addPolygons(weight = 1, color = ~pal_response(mean_response_time), fillOpacity = 1,
label = ~paste0("Mean Response Time: ", mean_response_time, "seconds"),
highlightOptions = highlightOptions(weight = 5, color = "white", bringToFront = TRUE))%>%
addLegend(pal = pal_response, values = ~ mean_response_time, title = "2013",
position = "topleft", opacity=0.7)
borough@data <-shp_response %>%
filter(year =="2014")
summary(borough$mean_response_time)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 234.8 297.1 380.4 465.0 492.6 920.3
pal_response <- colorNumeric("PuOr", domain = borough$mean_response_time)
map2014 <-borough %>%
leaflet()%>%
addProviderTiles("CartoDB")%>%
addPolygons(weight = 1, color = ~pal_response(mean_response_time), fillOpacity = 1,
label = ~paste0("Mean Response Time: ", mean_response_time, "seconds"),
highlightOptions = highlightOptions(weight = 5, color = "white", bringToFront = TRUE))%>%
addLegend(pal = pal_response, values = ~ mean_response_time, title = "2014",
position = "topleft", opacity=0.7)
borough@data <-shp_response %>%
filter(year =="2015")
summary(borough$mean_response_time)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 188.1 217.4 396.9 336.6 398.4 482.6
pal_response <- colorNumeric("PuOr", domain = borough$mean_response_time)
map2015 <-borough %>%
leaflet()%>%
addProviderTiles("CartoDB")%>%
addPolygons(weight = 1, color = ~pal_response(mean_response_time), fillOpacity = 1,
label = ~paste0("Mean Response Time: ", mean_response_time, "seconds"),
highlightOptions = highlightOptions(weight = 5, color = "white", bringToFront = TRUE))%>%
addLegend(pal = pal_response, values = ~ mean_response_time, title = "2013",
position = "topleft", opacity=0.7)
borough@data <-shp_response %>%
filter(year =="2016")
summary(borough$mean_response_time)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 226.1 233.4 359.7 353.4 410.7 537.0
pal_response <- colorNumeric("PuOr", domain = borough$mean_response_time)
map2016 <-borough %>%
leaflet()%>%
addProviderTiles("CartoDB")%>%
addPolygons(weight = 1, color = ~pal_response(mean_response_time), fillOpacity = 1,
label = ~paste0("Mean Response Time: ", mean_response_time, "seconds"),
highlightOptions = highlightOptions(weight = 5, color = "white", bringToFront = TRUE))%>%
addLegend(pal = pal_response, values = ~ mean_response_time, title = "2016",
position = "topleft", opacity=0.7)
borough@data <-shp_response %>%
filter(year =="2017")
summary(borough$mean_response_time)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 223.3 228.8 427.3 410.8 535.7 638.9
pal_response <- colorNumeric("PuOr", domain = borough$mean_response_time)
map2017 <-borough %>%
leaflet()%>%
addProviderTiles("CartoDB")%>%
addPolygons(weight = 1, color = ~pal_response(mean_response_time), fillOpacity = 1,
label = ~paste0("Mean Response Time: ", mean_response_time, "seconds"),
highlightOptions = highlightOptions(weight = 5, color = "white", bringToFront = TRUE))%>%
addLegend(pal = pal_response, values = ~ mean_response_time, title = "2017",
position = "topleft", opacity=0.7)
borough@data <-shp_response %>%
filter(year =="2018")
summary(borough$mean_response_time)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 236.0 244.1 246.9 396.6 248.8 1007.0
pal_response <- colorNumeric("PuOr", domain = borough$mean_response_time)
map2018 <-borough %>%
leaflet()%>%
addProviderTiles("CartoDB")%>%
addPolygons(weight = 1, color = ~pal_response(mean_response_time), fillOpacity = 1,
label = ~paste0("Mean Response Time: ", mean_response_time, "seconds"),
highlightOptions = highlightOptions(weight = 5, color = "white", bringToFront = TRUE))%>%
addLegend(pal = pal_response, values = ~ mean_response_time, title = "2018",
position = "topleft", opacity=0.7)
library(mapview)
sync(map2013, map2014, map2015, map2016, map2017,map2018, ncol = 3, sync = "all")
Please follow the instructions to submit your homework. The homework is due on Wednesday, March 25.
If you do come across something online that provides part of the analysis / code etc., please no wholesale copying of other ideas. We are trying to evaluate your abilities to visualize data, not the ability to do internet searches. Also, this is an individually assigned exercise – please keep your solution to yourself.